Unlabeled Data Does Provably Help
Authors
Abstract
A fully supervised learner needs access to correctly labeled examples, whereas a semi-supervised learner has access to examples only some of which are labeled. The hope is that a large collection of unlabeled examples significantly reduces the need for labeled ones. It is widely believed that this reduction of “label complexity” is marginal unless the hidden target concept and the domain distribution satisfy some “compatibility assumptions”. Some recent papers support this belief. In this paper, we revitalize the discussion by presenting a result that goes in the other direction. To this end, we consider the PAC-learning model in two settings: the (classical) fully supervised setting and the semi-supervised setting. We show that the “label-complexity gap” between the semi-supervised and the fully supervised setting can become arbitrarily large for concept classes of infinite VC-dimension (or for sequences of classes whose VC-dimensions are finite but become arbitrarily large). On the other hand, this gap is bounded by O(ln |C|) for each finite concept class C that contains the constant-zero and the constant-one function. A similar statement holds for all classes C of finite VC-dimension.
1998 ACM Subject Classification: I.2.6 Concept Learning
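For context on the O(ln |C|) term, the following is a minimal sketch of the standard textbook labeled-sample bound for a consistent learner over a finite concept class in the realizable PAC setting, m >= (ln |C| + ln(1/delta)) / epsilon. It is not taken from this paper and does not reproduce its gap result; the function name and parameters are illustrative assumptions. It only shows the scale at which ln |C| enters fully supervised label complexity.

```python
import math


def finite_class_label_bound(num_concepts: int, epsilon: float, delta: float) -> int:
    """Textbook sufficient number of labeled examples (realizable PAC, finite class).

    Assumption: this is the classical bound m >= (ln|C| + ln(1/delta)) / epsilon
    for any learner that outputs a concept consistent with the sample. It is
    an illustration, not the bound proved in the paper summarized above.
    """
    if num_concepts < 1:
        raise ValueError("num_concepts must be at least 1")
    if not (0.0 < epsilon < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil((math.log(num_concepts) + math.log(1.0 / delta)) / epsilon)


if __name__ == "__main__":
    # Squaring the class size only adds ln|C| / epsilon further labels.
    for size in (2**10, 2**20, 2**40):
        print(size, finite_class_label_bound(size, epsilon=0.1, delta=0.05))
```

The bound grows only logarithmically in |C|, so squaring the size of the class merely doubles the ln |C| contribution.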
Similar resources
Does Unlabeled Data Provably Help? Worst-case Analysis of the Sample Complexity of Semi-Supervised Learning
We study the potential benefits of unlabeled data for classification prediction. We compare learning in the semi-supervised model to the standard, supervised PAC (distribution-free) model, considering both the realizable and the unrealizable (agnostic) settings. Roughly speaking, our conclusion is that access to unlabeled samples cannot provide sample size guarantees that are bett...
Artemia: a family of provably secure authenticated encryption schemes
Authenticated encryption schemes establish both privacy and authenticity. This paper specifies a family of dedicated authenticated encryption schemes, Artemia. It is an online nonce-based authenticated encryption scheme that supports associated data. Artemia uses the permutation-based mode JHAE, which is provably secure in the ideal permutation model. The scheme does not require the in...
Learning Safe Prediction for Semi-Supervised Regression
Semi-supervised learning (SSL) concerns how to improve performance by using unlabeled data. Recent studies indicate that using unlabeled data might even degrade performance. Although some proposals have been developed to alleviate this fundamental challenge for semi-supervised classification, the efforts on semi-supervised regression (SSR) remain limited. In this work ...
Statistical Analysis of Semi-Supervised Regression
Semi-supervised methods use unlabeled data in addition to labeled data to construct predictors. While existing semi-supervised methods have shown some promising empirical performance, their development has been largely based on heuristics. In this paper we study semi-supervised learning from the viewpoint of minimax theory. Our first result shows that some common methods based on regulari...
Active Learning : Bandits
In supervised learning, the goal of the learning algorithm is to learn a predictor, h : X → Y, which accurately predicts labels of future instances. In the traditional PAC learning model, the learner receives a training set of examples which are sampled i.i.d. from an unknown distribution D over X × Y. The learner has no control over which examples it receives. In active learning the learner in...
Publication date: 2013